bot.zen @ EVALITA 2016 - A minimally-deep learning PoS-tagger (trained for Italian Tweets)
Abstract
English. This article describes the system that participated in the POS tagging for Italian Social Media Texts (PoSTWITA) task of EVALITA 2016, the 5th periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. The work is a continuation of Stemle (2016) with minor modifications to the system and different data sets. It combines a small assortment of trending techniques, which implement mature methods from NLP and ML, to achieve competitive results on PoS tagging of Italian Twitter texts; in particular, the system uses word embeddings and character-level representations of word beginnings and endings in an LSTM RNN architecture. Labelled data (Italian UD corpus, DiDi and PoSTWITA) and unlabelled data (Italian C4Corpus and PAISÀ) were used for training. The system is available under the APLv2 open-source license. Italiano. Questo articolo descrive il sistema che ha partecipato al task POS tagging for Italian Social Media Texts (PoSTWita) nell’ambito di EVALITA 2016, la 5° campagna di valutazione periodica del Natural Language Processing (NLP) e delle tecnologie del linguaggio. Il lavoro è un proseguimento di quanto descritto in Stemle (2016), con modifiche minime al sistema e insiemi di dati differenti. Il lavoro combina alcune tecniche correnti che implementano metodi comprovati dell’NLP e del Machine Learning, per raggiungere risultati competitivi nel PoS tagging dei testi italiani di Twitter. In particolare il sistema utilizza strategie di word embedding e di rappresentazione character-level di inizio e fine parola, in un’architettura LSTM RNN. Dati etichettati (Italian UD corpus, DiDi e PoSTWITA) e dati non etichettati (Italian C4Corpus e PAISÀ) sono stati utilizzati in
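As a rough illustration of what "character-level representations of word beginnings and endings" could look like as input features, the following is a minimal sketch and not the system's actual code. The window size `k = 3`, the padding symbol, and all function names are assumptions made for this example; the abstract does not state them.

```python
from typing import Dict, List, Tuple

PAD = "<pad>"  # hypothetical padding symbol for words shorter than k


def affix_chars(word: str, k: int = 3) -> Tuple[List[str], List[str]]:
    """Return the first k and last k characters of a word (k >= 1).

    Short words are right-padded at the beginning window and
    left-padded at the ending window, so both always have length k.
    """
    chars = list(word)
    missing = max(0, k - len(chars))
    prefix = chars[:k] + [PAD] * missing
    suffix = [PAD] * missing + chars[-k:]
    return prefix, suffix


def build_char_vocab(words: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character seen; id 0 is the padding."""
    vocab: Dict[str, int] = {PAD: 0}
    for word in words:
        for ch in word:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(word: str, vocab: Dict[str, int], k: int = 3) -> List[int]:
    """Encode a word as 2*k character ids (beginning window + ending window).

    Characters missing from the vocabulary map to one shared unknown id.
    """
    prefix, suffix = affix_chars(word, k)
    unk = len(vocab)
    return [vocab.get(ch, unk) for ch in prefix + suffix]


if __name__ == "__main__":
    vocab = build_char_vocab(["ciao", "di"])
    print(encode("ciao", vocab))
    print(encode("di", vocab))
```

In a tagger of the kind described, id sequences like these would typically be passed through a character embedding layer and combined with the word embedding before the LSTM; that wiring is omitted here because the abstract gives no details about it.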
Similar resources
Character Embeddings PoS Tagger vs HMM Tagger for Tweets
English. The paper describes our submissions to the task on PoS tagging for Italian Social Media Texts (PoSTWITA) at Evalita 2016. We compared two approaches: a traditional HMM trigram PoS tagger and a Deep Learning PoS tagger using both character-level and word-level embeddings. The character-level embeddings performed better, proving that they can provide a finer representation of words that a...
A BiLSTM-CRF PoS-tagger for Italian tweets using morphological information
English. This paper presents some experiments for the construction of a high-performance PoS-tagger for Italian using deep neural network (DNN) techniques integrated with a powerful Italian morphological analyser, which has been applied to tag Italian tweets. The proposed system ranked third at the EVALITA 2016 PoSTWITA campaign. Italiano. Questo contributo presenta alcuni esperimenti per la cost...
bot.zen @ EmpiriST 2015 - A minimally-deep learning PoS-tagger (trained for German CMC and Web data)
This article describes the system that participated in the Part-of-speech tagging subtask of the EmpiriST 2015 shared task on automatic linguistic annotation of computer-mediated communication / so-
(Better than) State-of-the-Art PoS-tagging for Italian Texts
English. This paper presents some experiments for the construction of a high-performance PoS-tagger for Italian using deep neural network (DNN) techniques integrated with a powerful Italian morphological analyser. The results obtained by the proposed system on standard datasets taken from the EVALITA campaigns show large accuracy improvements when compared with previous systems from the liter...
Mivoq Evalita 2016 PosTwITA tagger
English. This paper presents the POS tagger developed by Mivoq to tag tweets according to the PosTwITA task guidelines defined at Evalita 2016. The system placed third with 92.7% accuracy. Italiano. Si presenta il POS tagger sviluppato da Mivoq per etichettare i tweet secondo le linee guida del task PosTwITA, così come definite per Evalita 2016. Il sistema ha ottenuto la terza posizione co...